-
Notifications
You must be signed in to change notification settings - Fork 15
arc: emit clobber of CC for -mcpu=em x >> 31 #186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation looks fine. Let's improve the commit message a bit. Help the reviewer understand what you did.
If the target has no barrel shifter, clobber is now emitted in a parallel RTL expression. si3_loop matches the parallel expression and emits the necessary expressions that may or may not clobber the CC. This happens in split1, before registers are assigned and dead stores are removed. If the clobber was unnecessary, its consequences will be removed by those steps.
A bit too verbose IMO. Just describe the problem we're solving:
Devices without a barrel shifter end up using a sequence of
instructions. These can use the condition codes and/or loop count
register, so those need to be marked as 'clobbered'. These clobbers were previously added only after split1, which is too late. This patch adds these clobbers from the beginning, in thedefine_expand.
Explain succinctly the changes you did, why did you merge _nobs and _loop?
si3_cnt1_clobber is placed below all the specific si3_cnt1 patterns.
It matches the si3_loop pattern in the case operand2 has become
const_int 1, and remove the clobber so the specific si3_cnt1 patterns
can be matched.
If si3_cnt1_clobber is placed above these patterns, some si3_cnt
would try to emit extra clobbers to match it instead of their more
specific pattern that match exactly. This would cause clobbers of hard
registers, causing failure.
This is a bit hard to parse. But I don't think you need to mention this. I think the comment you wrote in the code is sufficient.
This, and perhaps a before and after assembly example might help. |
a035cf1 to
b25c113
Compare
Address the issue explained here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120375 Devices without a barrel shifter end up using a sequence of instructions. These can use the condition codes and/or loop count register, so those need to be marked as 'clobbered'. These clobbers were previously added only after split1, which is too late. This patch adds these clobbers from the beginning, in the define_expand. Previously, define_insn_and_split *<insn>si3_nobs would match any shift or rotate instruction and would generate the necessary patterns to emulate a barrel shifter, but it did not have any output assembly for itself. In many cases this would create a loop with parallel clobbers. This pattern is then matched by the <insn>si3_loop pattern. In the no-barrel-shifter.c test tree code: ;; no-barrel-shifter.c:9: int sign = (x >> 31) & 1; _2 = x.0_1 >> 31; in the expand pass becomes the following pattern that matches *lshrsi3_nobs: (insn 18 17 19 4 (set (reg:SI 153 [ _2 ]) (lshiftrt:SI (reg/v:SI 156 [ x ]) (const_int 31 [0x1f]))) "test2.c":9:24 -1 (nil)) This pattern misses the necessary clobbers and remains untouched until the split1 pass. Together with the later branch it becomes ;; no-barrel-shifter.c:9: int sign = (x >> 31) & 1; add.f 0,r0,r0 ;; no-barrel-shifter.c:14: if (mag == 0x7f800000) beq.d .L8 ;; no-barrel-shifter.c:9: int sign = (x >> 31) & 1; rlc r0,0 Leading to an issue: the add.f instructions overwrites CC but beq expects CC to contain an earlier value indicating mag == 0x7f800000. Now, these are combined in define_insn_and_split <insn>si3_loop that is explicitly emitted in the define_expand and already contains the clobbers. This can then be split into another pattern or remain the loop pattern. In the expand pass, the same example now becomes: (insn 18 17 19 4 (parallel [ (set (reg:SI 153 [ _2 ]) (lshiftrt:SI (reg/v:SI 156 [ x ]) (const_int 31 [0x1f]))) (clobber (reg:SI 60 lp_count)) (clobber (reg:CC 61 cc)) ]) "test2.c":9:24 -1 (nil)) Because the correct clobbers are now taken into account, the branch condition is reevaluated by using breq instead of br. ;; no-barrel-shifter.c:9: int sign = (x >> 31) & 1; add.f 0,r0,r0 rlc r0,0 ;; no-barrel-shifter.c:14: if (mag == 0x7f800000) breq r2,2139095040,.L8 Regtested for arc. Co-authored-by: Keith Packard <[email protected]> Signed-off-by: Loeka Rogge <[email protected]>
b25c113 to
3f600ca
Compare
|
I rewrote the commit with a better explanation and some RTL and assembly from the test. I also added keithp (who reported the bug) as co-author. |
luismgsilva
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Great commit message.
Fixed solution that was merged in #181 but was reverted because it caused issues. It has been built and tested on the jenkins without issues.